perm filename VIS[0,BGB]2 blob
sn#066768 filedate 1973-10-17 generic text, type C, neo UTF8
2.1 Introduction to Computer Vision Theory.
In this chapter, two levels of theory are interleaved. There
is a general theory, which is my interpretation of the overall state
of computer vision; and there is a particular theory, which has
inspired this work.
The word "theory", as used here, means simply a concise body
of statements presenting a systematic view of a subject.
Specifically, I wish to exclude the connotation that the theory is
a mathematical theory or a natural theory; it must therefore be an
"artificial theory" or, in more conventional terms, a philosophy
of computer vision. The formal statements of the theory will be
made in {BOLDFACE} and will be either definitions, notions or
assumptions.
The validity of a natural theory is tested by experiment; a
mathematical theory is judged valid by its consistency; an
artificial theory is validated by the successful design and
production of the intended artifact. However, when there is a need
to compare unvalidated theories, the grounds for such comparison are
the usual means of verbal discourse: analogy, anecdote, scenario,
philosophical arguments and plausible reasoning.
2.2 A Preview of My Computer Vision Theory.
(vision and AI). Given a computer with several television
cameras, two mechanical arms and a radio controlled cart, the
overall problem is to write a program that can see and act
intelligently with respect to the physical world. In my opinion,
the A.I. topics relevant to such robot programs are vision, world
modeling, and goal seeking; while only minor roles are played by
language, logic, and problem solving.
(the vision loop). Computer vision is the inverse of
computer graphics. The problem of computer graphics is to synthesize
images from three dimensional models; the problem of computer vision
is to analyze images into three dimensional models. The overall
structure of a general purpose computer vision system is that
of a "feedback" loop between 2-D images and a 3-D world model.
Depending on circumstances, the vision loop should be able to run
almost entirely top-down (verification vision) or bottom-up
(revelation vision). Verification vision is all that is required in
a well-known and consequently predictable environment; whereas
revelation vision is required in a brand new or rapidly changing
environment.
(the nature of images). There are three basic kinds of
information in a 2-D visual image: photometric, geometric, and
topological; also there are four kinds of 2-D images: raster,
contour, mosaic, and feature. The traditional subject of image
processing involves the study and development of programs that
enhance, transform and compare 2-D images. Nearly all such image
processing work can be subsumed into computer vision.
(locus solving). The crux of computer vision, however, is to
deduce information about the world being viewed from images of that
world. I believe that the world information most directly relevant
is the physical location, extent and light scattering properties of
solid opaque objects; the location, orientation and scales of the
cameras that take the pictures; and the location and nature of the
lights that illuminate the world. Accordingly, three central themes
of my theory are body locus solving, camera solving, and sun
solving. The macroscopic world doesn't change very rapidly; between
any two world states there is an intermediate world state. Parallax
is the principal means of depth perception. Parallax is the
alchemist that converts 2-D images into 3-D models. Revelation vision
is a process of comparing perceived images taken in sequence and
constructing a 3-D model of the unanticipated objects.
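The claim that parallax converts 2-D images into 3-D information can be illustrated with the standard two-view triangulation relation, depth = focal length x baseline / disparity. The following is only a minimal sketch, assuming two pinhole cameras with parallel optical axes; the function name and its parameters are hypothetical and are not taken from this work.

```python
def depth_from_parallax(focal_length_px, baseline, disparity_px):
    """Depth of a point seen by two parallel pinhole cameras (assumed model).

    focal_length_px: focal length in pixels, assumed known from camera solving
    baseline: distance between the two camera centers, in world units
    disparity_px: shift of the point between the two images, in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_length_px * baseline / disparity_px


# A nearby point shifts more between the two views than a distant one.
near = depth_from_parallax(focal_length_px=500.0, baseline=0.25, disparity_px=50.0)
far = depth_from_parallax(focal_length_px=500.0, baseline=0.25, disparity_px=5.0)
```

The same relation shows why small camera motions give poor depth for distant objects: as the disparity approaches zero, a one pixel measurement error changes the computed depth enormously.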
(recognition). Recognition involves comparing perceived data
with predicted data; such recognition comparing can be done on any
of the four types of 2-D images or the 3-D models. Arcane recognition
techniques can be avoided by improving the prediction and the
analysis so that matches are nearly obvious.
(the nature of worlds). The rules about the world that can
be assumed a priori by a programmer are the laws of physics;
programming a simulation of the mundane physical world to a given
approximation is difficult.
The remainder of this chapter is devoted to elaborating and
defending this theory.
2.3 Computer Vision and Artificial Intelligence.
At one extreme, computer vision may be described as merely
the problem of getting visual input hardware properly connected to a
computer; once the computer can "see" a raster of intensities in its
memory, the rest of the problem is artificial intelligence. The
other extreme is harder to depict because it requires figuring out where
to draw the line between vision software and intelligence software.
Notion: The Top-Down and Bottom-Up in Computer Vision.
The vision sensor hardware is the "bottom";
visual software and intelligence is the "top".
Normal vision should not be an Artificial Intelligence
problem in the sense that it will not involve searching a large
space of possibilities or solving abstract problems.
"The history of progress in the development of systems for automatic
symbolic integration poses an interesting question about the
definition of artificial intelligence. Few would argue that Slagle's
SAINT program was a product of artificial intelligence research.
Moses' SIN program for symbolic integration seldom needed to resort
to search, and for this reason some people consider it much more
powerful (intelligent ?) than SAINT. Now, Risch (1969) has developed
an algorithm for integrating many types of expressions. Risch
considers himself a mathematician, not an artificial intelligence
researcher. In your opinion should Risch's algorithm be considered
part of the subject matter of artificial intelligence ? If you would
exclude Risch from artificial intelligence, how would you respond to
the statement that every artificial intelligence program might
eventually be dominated by a (more intelligent?) non artificial
intelligence algorithm? If you would include Risch, would you also
include the long-division algorithm?"
- Nils J. Nilsson, problem 4-5;
Problem-Solving Methods in Artificial Intelligence.
(Intellectual Entities). The larger context of a vision
theory depends on one's opinion about the nature of conscious
intelligent animals, men and robots. It is my opinion that mind is
to matter as computer software is to computer hardware. That is,
mind is a program that is running in the brain. Well now, what
software can account for consciousness, the inner private life of
the self that burns in our heads? The so-called stream of
consciousness consists of little voice(s) talking, fragments of
music playing, and, most important, the flow of the here and
now. The "here-and-now" is the totality of the particular sights,
sounds, smells, and so on that are being played in your head in
sync with the respective sensory stimuli. So I believe that
the major computation being performed by an intellectual entity in
order to stay conscious of its external world is a reality
simulation.
"The design, implementation, and use of the robot hardware
presents some difficult, and often expensive, engineering and
maintenance problems. If one is to work in this area solving such
problems is a necessary prelude but, more often than not,
unrewarding because the activity does not address the questions of
A.I. research that motivate the project. Why, then, build devices?
Why not simulate them and their environment? In fact, the SRI group
has done good work in simulating a version of their robot in a
simplified environment. The answer given is as follows. It is felt
by the SRI group that the most unsatisfactory part of their
simulation effort was the simulation of the environment. Yet, they
say that 90% of the effort of the simulation team went into this
part of the simulation. It turned out to be very difficult to
reproduce in an internal representation for a computer the necessary
richness of environment that would give rise to interesting behavior
by the highly adaptive robot. It is easier and cheaper to build a
hardware robot to extract what information it needs from the real
world than to organize and store a useful model. Crudely put, the
SRI group's argument is that the most economic and efficient store
of information about the real world is the real world itself."
- E. A. Feigenbaum [ref. X].
2.4 The Vision Loop.
Assumption: The overall structure of a general purpose computer
vision system is that of a "feedback" loop
between 2-D images and a 3-D world model.
Alternatives: 1. Computer vision is structured like a compiler.
2. Computer vision is structured in terms of
discrimination functions.
Assumption: Computer vision is both top down and bottom up.
Alternatives: 1. Computer vision is mostly top down.
2. Computer vision is mostly bottom up.
Computer vision is the inverse of computer graphics. The problem of
computer graphics is to synthesize images from three dimensional
models; the problem of computer vision is to analyze images into
three dimensional models.
Vision loop terminology:
1. PREDICT    2-D ← 3-D    synthesis    Verification
2. PERCEIVE   3-D ← 2-D    analysis     Revelation
3. COMPARE    recognition
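The predict, perceive, compare cycle can be sketched as a small program. This is only an illustration under strong assumptions: the "world" is a one dimensional row of cells, an "image" is a list of 0/1 pixels, and the names predict, compare and vision_loop are hypothetical. It stands in for the 3-D to 2-D case by analogy only.

```python
def predict(model, width):
    """Synthesis: render the world model into a predicted image."""
    return [1 if x in model else 0 for x in range(width)]


def compare(predicted, perceived):
    """Recognition: list the pixels where prediction and perception disagree."""
    return [x for x, (p, q) in enumerate(zip(predicted, perceived)) if p != q]


def vision_loop(model, perceived_images):
    """Run one predict/perceive/compare cycle per incoming image."""
    model = set(model)
    history = []
    for perceived in perceived_images:
        predicted = predict(model, len(perceived))
        mismatches = compare(predicted, perceived)
        # No mismatches: the top-down prediction was verified.
        # Mismatches: fall back on bottom-up analysis and revise the model.
        mode = "verification" if not mismatches else "revelation"
        for x in mismatches:
            if perceived[x]:
                model.add(x)      # an unanticipated object appears
            else:
                model.discard(x)  # a predicted object is gone
        history.append(mode)
    return model, history


final_model, modes = vision_loop({1, 4}, [[0, 1, 0, 0, 1, 0], [0, 1, 0, 0, 1, 1]])
```

In this toy run the first image matches the prediction, so the loop stays in verification; the second image contains an object at cell 5 that the model did not anticipate, so the loop switches to revelation and adds it to the model.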
Description of nearly pure top down vision...........................
Description of nearly pure bottom up vision..........................
2.5 The Nature of Images.
Assumption: Computer vision based on digitized television images.
Alternatives: 1. Active 3-D imaging device.
2. Non-light devices: sound, radar, neutrinos, etc.
A super intellectual entity, however, would have eyes that could see
the whole electromagnetic spectrum from gamma radiation to direct current
as well as "voices" that could broadcast on any and all frequencies.
Notion: An image contains three basic kinds of data:
topological data, geometric data, and photometric data.
2.X A Notion of Computer Vision.
Assumption: I will use a real computer capable of
taking real images of the real world.
Alternatives: 1. ...use a real computer and simulated images.
2. ...think about using a computer.
3. Study biological vision systems.
2.6 Locus Solving.
2.6.1 Camera Locus Solving.
2.6.2 Body Locus Solving.
Silhouette Cone Intersection.
Envelope bodies.
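The idea of intersecting silhouette cones to obtain an envelope body can be sketched as follows. This is not the method of this work: it swaps in a voxel grid and orthographic views along the coordinate axes (so each "cone" degenerates to a prism), purely as an assumption to keep the example short. The name carve and its arguments are hypothetical.

```python
from itertools import product


def carve(silhouettes, size):
    """Keep the voxels whose projection falls inside every silhouette.

    silhouettes maps a viewing axis (0, 1 or 2) to the set of 2-D pixels
    covered by the object when viewed along that axis.
    """
    body = set()
    for voxel in product(range(size), repeat=3):
        inside_all = True
        for axis, pixels in silhouettes.items():
            # Orthographic projection: drop the coordinate along the viewing axis.
            pixel = tuple(c for i, c in enumerate(voxel) if i != axis)
            if pixel not in pixels:
                inside_all = False
                break
        if inside_all:
            body.add(voxel)
    return body


# Three axis-aligned views of a 2x2x2 cube sitting in a 4x4x4 grid.
square = {(a, b) for a in (1, 2) for b in (1, 2)}
envelope = carve({0: square, 1: square, 2: square}, size=4)
```

The result is an envelope in the sense that it contains the true body but may be larger: concavities that never show up in any silhouette cannot be carved away.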
2.6.3 Sun Locus Solving.
(compute it, look at it, shine and shadows).
2.7 Recognition.
2.8 The Nature of Worlds.
Assumption: The world model should be a 3-D geometric model.
Alternatives: 1. Image memory and 2-D models.
2. Procedural knowledge.
3. Semantic knowledge.
4. Formal Logic models.
(On Partial Knowledge).
Assumption: Partial knowledge should be represented by approximation.
Alternatives: 1. Tree of possibilities.
2. Multi-valued logic.
3. Probabilities.
(Alternate world models).
(Reality Simulation).
"For the purpose of presenting my argument I must first explain the
basic premise of sorcery as don Juan presented it to me. He said
that for a sorcerer, the world of everyday life is not real, or out
there, as we believe it is. For a sorcerer, reality or the world we
all know, is only a description. For the sake of validating this
premise don Juan concentrated the best of his efforts into leading
me to a genuine conviction that what I held in mind as the world at
hand was merely a description of the world; a description that had
been pounded into me from the moment I was born."
- Carlos Castaneda. Journey to Ixtlan.
2.9 Related Vision Work.
Stanford Hand/Eye
SRI - Hart & Duda.
MIT - Guzman, Waltz.